{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved.\n",
    "\n",
    "Licensed under the MIT License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extractive Summarization on CNN/DM Dataset using Transformer Version of BertSum\n",
    "\n",
    "\n",
    "### Summary\n",
    "\n",
    "This notebook demonstrates how to fine tune Transformers for extractive text summarization. Utility functions and classes in the NLP Best Practices repo are used to facilitate data preprocessing, model training, model scoring, result postprocessing, and model evaluation.\n",
    "\n",
    "BertSum refers to  [Fine-tune BERT for Extractive Summarization](https://arxiv.org/pdf/1903.10318.pdf) with [published example](https://github.com/nlpyang/BertSum/). And the Transformer version of Bertsum refers to our modification of BertSum and the source code can be accessed at (https://github.com/daden-ms/BertSum/). \n",
    "\n",
    "Extractive summarization are usually used in document summarization where each input document consists of mutiple sentences. The preprocessing of the input training data involves assigning label 0 or 1 to the document sentences based on the give summary. The summarization problem is also simplfied to classifying whether a document sentence should be included in the summary. \n",
    "\n",
    "The figure below illustrates how BERTSum can be fine tuned for extractive summarization task. [CLS] token is inserted at the beginning of each sentence, so is [SEP] token at the end. Interval segment embedding and positional embedding are added upon the token embedding as the input of the BERT model. The [CLS] token representation is used as sentence embedding and only the [CLS] tokens are used as the input for the summarization model. The summarization layer predicts the probability for each  sentence being included in the summary. Techniques like trigram blocking can be used to improve model accuarcy.   \n",
    "\n",
    "<img src=\"https://nlpbp.blob.core.windows.net/images/BertSum.PNG\">\n",
    "\n",
    "\n",
    "### Before You Start\n",
    "\n",
    "The running time shown in this notebook is on a Standard_NC24s_v3 Azure Ubuntu Virtual Machine with 4 NVIDIA Tesla V100 GPUs. \n",
    "> **Tip**: If you want to run through the notebook quickly, you can set the **`QUICK_RUN`** flag in the cell below to **`True`** to run the notebook on a small subset of the data and a smaller number of epochs. \n",
    "\n",
    "Using only 1 NVIDIA Tesla V100 GPUs, 16GB GPU memory configuration,\n",
    "- for data preprocessing, it takes around 1 minutes to preprocess the data for quick run. Otherwise it takes ~20 minutes to finish the data preprocessing. This time estimation assumes that the chosen transformer model is \"distilbert-base-uncased\" and the sentence selection method is \"greedy\", which is the default. The preprocessing time can be significantly longer if the sentence selection method is \"combination\", which can achieve better model performance.\n",
    "\n",
    "- for model fine tuning, it takes around 2 minutes for quick run. Otherwise, it takes around ~3 hours to finish. This estimation assumes the chosen encoder method is \"transformer\". The model fine tuning time can be shorter if other encoder method is chosen, which may result in worse model performance. \n",
    "\n",
    "### Additional Notes\n",
    "\n",
    "* **ROUGE Evalation**: To run rouge evaluation, please refer to the section of compute_rouge_perl in [summarization_evaluation.ipynb](./summarization_evaluation.ipynb) for setup.\n",
    "\n",
    "* **Distributed Training**:\n",
    "Please note that the jupyter notebook only allows to use pytorch [DataParallel](https://pytorch.org/docs/master/nn.html#dataparallel). Faster speed and larger batch size can be achieved with pytorch [DistributedDataParallel](https://pytorch.org/docs/master/notes/ddp.html)(DDP). Script [extractive_summarization_cnndm_distributed_train.py](./extractive_summarization_cnndm_distributed_train.py) shows an example of how to use DDP.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "## Set QUICK_RUN = True to run the notebook on a small subset of data and a smaller number of epochs.\n",
    "QUICK_RUN = True\n",
    "## Set USE_PREPROCSSED_DATA = True to skip the data preprocessing\n",
    "USE_PREPROCSSED_DATA = False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Configuration\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import shutil\n",
    "import sys\n",
    "from tempfile import TemporaryDirectory\n",
    "import torch\n",
    "\n",
    "nlp_path = os.path.abspath(\"../../\")\n",
    "if nlp_path not in sys.path:\n",
    "    sys.path.insert(0, nlp_path)\n",
    "\n",
    "from utils_nlp.dataset.cnndm import CNNDMBertSumProcessedData, CNNDMSummarizationDataset\n",
    "from utils_nlp.eval import compute_rouge_python, compute_rouge_perl\n",
    "from utils_nlp.models.transformers.extractive_summarization import (\n",
    "    ExtractiveSummarizer,\n",
    "    ExtSumProcessedData,\n",
    "    ExtSumProcessor,\n",
    ")\n",
    "\n",
    "from utils_nlp.models.transformers.datasets import SummarizationDataset\n",
    "import nltk\n",
    "from nltk import tokenize\n",
    "\n",
    "import pandas as pd\n",
    "import scrapbook as sb\n",
    "import pprint"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### Configuration: choose the transformer model to be used"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Several pretrained models have been made available by [Hugging Face](https://github.com/huggingface/transformers). For extractive summarization, the following pretrained models are supported. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame({\"model_name\": ExtractiveSummarizer.list_supported_models()})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "# Transformer model being used\n",
    "MODEL_NAME = \"distilbert-base-uncased\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# notebook parameters\n",
    "# the cache data path during find tuning\n",
    "CACHE_DIR = TemporaryDirectory().name\n",
    "processor = ExtSumProcessor(model_name=MODEL_NAME, cache_dir=CACHE_DIR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data Preprocessing\n",
    "\n",
    "The dataset we used for this notebook is CNN/DM dataset which contains the documents and accompanying questions from the news articles of CNN and Daily mail. The highlights in each article are used as summary. The dataset consits of ~289K training examples, ~11K valiation examples and ~11K test examples.  You can choose the [Option 1] below preprocess the data or [Option 2] to use the preprocessed version at [BERTSum published example](https://github.com/nlpyang/BertSum/). You don't need to manually download any of these two data sets as the code below will handle downloading. Functions defined specific in [cnndm.py](../../utils_nlp/dataset/cnndm.py) are unique to CNN/DM dataset that's preprocessed by harvardnlp. However, it provides a skeleton of how to preprocessing text into the format that model preprocessor takes: sentence tokenization and work tokenization. \n",
    "\n",
    "##### Details of Data Preprocessing\n",
    "\n",
    "The purpose of preprocessing is to process the input articles to the format that model finetuning needed. Assuming you have (1) all articles and (2) target summaries, each in a file and line-breaker separated, the steps to preprocess the data are:\n",
    "1. sentence tokenization\n",
    "2. word tokenization\n",
    "3. **label** the sentences in the article with 1 meaning the sentence is selected and 0 meaning the sentence is not selected. The algorithms for the sentence selection are \"greedy\" and \"combination\" and can be found in [sentence_selection.py](../../utils_nlp/dataset/sentence_selection.py)\n",
    "3. convert each example to  the desired format for extractive summarization\n",
    "    - filter the sentences in the example based on the min_src_ntokens argument. If the lefted total sentence number is less than min_nsents, the example is discarded.\n",
    "    - truncate the sentences in the example if the length is greater than max_src_ntokens\n",
    "    - truncate the sentences in the example and the labels if the total number of sentences is greater than max_nsents\n",
    "    - [CLS] and [SEP] are inserted before and after each sentence\n",
    "    - wordPiece tokenization or Byte Pair Encoding (BPE) subword tokenization\n",
    "    - truncate the example to 512 tokens\n",
    "    - convert the tokens into token indices corresponding to the transformer tokenizer's vocabulary.\n",
    "    - segment ids are generated and added\n",
    "    - [CLS] token positions are logged\n",
    "    - [CLS] token labels are truncated if it's greater than 512, which is the maximum input length that can be taken by the transformer model.\n",
    "    \n",
    "    \n",
    "Note that the original BERTSum paper use Stanford CoreNLP for data preprocessing, here we use NLTK for data preprocessing. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### [Option 1] Preprocess  data (Please skil this part if you choose to use preprocessed data)\n",
    "The code in following cell will download the CNN/DM dataset listed at https://github.com/harvardnlp/sent-summary/."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "# the data path used to save the downloaded data file\n",
    "DATA_PATH = TemporaryDirectory().name\n",
    "# The number of lines at the head of data file used for preprocessing. -1 means all the lines.\n",
    "TOP_N = 1000\n",
    "if not QUICK_RUN:\n",
    "    TOP_N = -1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "train_dataset, test_dataset = CNNDMSummarizationDataset(top_n=TOP_N, local_cache_path=DATA_PATH)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Preprocess the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "\n",
    "ext_sum_train = processor.preprocess(train_dataset, oracle_mode=\"greedy\")\n",
    "ext_sum_test = processor.preprocess(test_dataset, oracle_mode=\"greedy\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "# save and load preprocessed data\n",
    "save_path = os.path.join(DATA_PATH, \"processed\")\n",
    "torch.save(ext_sum_train, os.path.join(save_path, \"train_full.pt\"))\n",
    "torch.save(ext_sum_test, os.path.join(save_path, \"test_full.pt\"))\n",
    "\n",
    "\"\"\"\n",
    "# ext_sum_train = torch.load(os.path.join(save_path, \"train_full.pt\"))\n",
    "# ext_sum_test = torch.load(os.path.join(save_path, \"test_full.pt\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "len(ext_sum_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "len(ext_sum_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Inspect Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ext_sum_train[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "ext_sum_train[0].keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### [Option 2] Reuse Preprocessed  data from [BERTSUM Repo](https://github.com/nlpyang/BertSum)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "parameters",
     ":w"
    ]
   },
   "outputs": [],
   "source": [
    "# the data path used to downloaded the preprocessed data from BERTSUM Repo.\n",
    "# if you have downloaded the dataset, change the code to use that path where the dataset is.\n",
    "PROCESSED_DATA_PATH = TemporaryDirectory().name\n",
    "os.makedirs(PROCESSED_DATA_PATH, exist_ok=True)\n",
    "#data_path = \"./temp_data5/\"\n",
    "#PROCESSED_DATA_PATH = data_path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if USE_PREPROCSSED_DATA:\n",
    "    download_path = CNNDMBertSumProcessedData.download(local_path=PROCESSED_DATA_PATH)\n",
    "    ext_sum_train, ext_sum_test = ExtSumProcessedData().splits(root=download_path, train_iterable=True)\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Model training\n",
    "To start model training, we need to create a instance of ExtractiveSummarizer.\n",
    "#### Choose the transformer model.\n",
    "Currently ExtractiveSummarizer support two models:\n",
    "- distilbert-base-uncase, \n",
    "- bert-base-uncase\n",
    "\n",
    "Potentionally, roberta-based model and xlnet can be supported but needs to be tested.\n",
    "#### Choose the encoder algorithm.\n",
    "There are four options:\n",
    "- baseline: it used a smaller transformer model to replace the bert model and with transformer summarization layer\n",
    "- classifier: it uses pretrained BERT and fine-tune BERT with **simple logistic classification** summarization layer\n",
    "- transformer: it uses pretrained BERT and fine-tune BERT with **transformer** summarization layer\n",
    "- RNN: it uses pretrained BERT and fine-tune BERT with **LSTM** summarization layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "BATCH_SIZE = 5 # batch size, unit is the number of samples\n",
    "MAX_POS_LENGTH = 512\n",
    "if USE_PREPROCSSED_DATA: #if bertsum published data is used\n",
    "    BATCH_SIZE = 3000 # batch size, unit is the number of tokens\n",
    "    MAX_POS_LENGTH = 512\n",
    "    \n",
    "\n",
    "\n",
    "# GPU used for training\n",
    "NUM_GPUS = torch.cuda.device_count()\n",
    "\n",
    "# Encoder name. Options are: 1. baseline, classifier, transformer, rnn.\n",
    "ENCODER = \"transformer\"\n",
    "\n",
    "# Learning rate\n",
    "LEARNING_RATE=2e-3\n",
    "\n",
    "# How often the statistics reports show up in training, unit is step.\n",
    "REPORT_EVERY=100\n",
    "\n",
    "# total number of steps for training\n",
    "MAX_STEPS=1e2\n",
    "# number of steps for warm up\n",
    "WARMUP_STEPS=5e2\n",
    "    \n",
    "if not QUICK_RUN:\n",
    "    MAX_STEPS=5e4\n",
    "    WARMUP_STEPS=5e3\n",
    " "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "summarizer = ExtractiveSummarizer(processor, MODEL_NAME, ENCODER, MAX_POS_LENGTH, CACHE_DIR)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "#\"\"\"\n",
    "\n",
    "summarizer.fit(\n",
    "            ext_sum_train,\n",
    "            num_gpus=NUM_GPUS,\n",
    "            batch_size=BATCH_SIZE,\n",
    "            gradient_accumulation_steps=2,\n",
    "            max_steps=MAX_STEPS,\n",
    "            learning_rate=LEARNING_RATE,\n",
    "            warmup_steps=WARMUP_STEPS,\n",
    "            verbose=True,\n",
    "            report_every=REPORT_EVERY,\n",
    "            clip_grad_norm=False,\n",
    "            use_preprocessed_data=USE_PREPROCSSED_DATA\n",
    "        )\n",
    "\n",
    "#\"\"\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "summarizer.save_model(\n",
    "    os.path.join(\n",
    "        CACHE_DIR,\n",
    "        \"extsum_modelname_{0}_usepreprocess{1}_steps_{2}.pt\".format(\n",
    "            MODEL_NAME, USE_PREPROCSSED_DATA, MAX_STEPS\n",
    "        ),\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# for loading a previous saved model\n",
    "\"\"\"\n",
    "import torch\n",
    "model_path = os.path.join(\n",
    "        CACHE_DIR,\n",
    "        \"extsum_modelname_{0}_usepreprocess{1}_steps_{2}.pt\".format(\n",
    "            MODEL_NAME, USE_PREPROCSSED_DATA, MAX_STEPS\n",
    "        ))\n",
    "summarizer = ExtractiveSummarizer(processor, MODEL_NAME, ENCODER, MAX_POS_LENGTH, CACHE_DIR)\n",
    "summarizer.model.load_state_dict(torch.load(model_path, map_location=\"cpu\"))\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Model Evaluation\n",
    "\n",
    "[ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)), or Recall-Oriented Understudy for Gisting Evaluation has been commonly used for evaluating text summarization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ext_sum_test[0].keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if \"segs\" in ext_sum_test[0]: # preprocessed_data\n",
    "    source = [i['src_txt'] for i in ext_sum_test]\n",
    "    target = [\"\\n\".join(i['tgt_txt'].split(\"<q>\")) for i in ext_sum_test]\n",
    "else:\n",
    "    source = []\n",
    "    temp_target = []\n",
    "    for i in ext_sum_test:\n",
    "        source.append(i[\"src_txt\"]) \n",
    "        temp_target.append(\" \".join(j) for j in i['tgt']) \n",
    "    target = [''.join(i) for i in list(temp_target)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "sentence_separator = \"\\n\"\n",
    "prediction = summarizer.predict(ext_sum_test, num_gpus=NUM_GPUS, batch_size=256, sentence_separator=sentence_separator)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "len(prediction)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rouge_scores = compute_rouge_python(cand=prediction, ref=target)\n",
    "pprint.pprint(rouge_scores)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "target[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prediction[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "source[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# for testing\n",
    "sb.glue(\"rouge_2_f_score\", rouge_scores['rouge-2']['f'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prediction on a single input sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "source = \"\"\"\n",
    "But under the new rule, set to be announced in the next 48 hours, Border Patrol agents would immediately return anyone to Mexico — without any detainment and without any due process — who attempts to cross the southwestern border between the legal ports of entry. The person would not be held for any length of time in an American facility.\n",
    "\n",
    "Although they advised that details could change before the announcement, administration officials said the measure was needed to avert what they fear could be a systemwide outbreak of the coronavirus inside detention facilities along the border. Such an outbreak could spread quickly through the immigrant population and could infect large numbers of Border Patrol agents, leaving the southwestern border defenses weakened, the officials argued.\n",
    "The Trump administration plans to immediately turn back all asylum seekers and other foreigners attempting to enter the United States from Mexico illegally, saying the nation cannot risk allowing the coronavirus to spread through detention facilities and Border Patrol agents, four administration officials said.\n",
    "The administration officials said the ports of entry would remain open to American citizens, green-card holders and foreigners with proper documentation. Some foreigners would be blocked, including Europeans currently subject to earlier travel restrictions imposed by the administration. The points of entry will also be open to commercial traffic.\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_dataset = SummarizationDataset(\n",
    "    None,\n",
    "    source=[source],\n",
    "    source_preprocessing=[tokenize.sent_tokenize],\n",
    "    word_tokenize=nltk.word_tokenize,\n",
    ")\n",
    "processor = ExtSumProcessor(model_name=MODEL_NAME,  cache_dir=CACHE_DIR)\n",
    "preprocessed_dataset = processor.preprocess(test_dataset)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "preprocessed_dataset[0].keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prediction = summarizer.predict(preprocessed_dataset, num_gpus=0, batch_size=1, sentence_separator=\"\\n\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prediction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clean up temporary folders"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if os.path.exists(DATA_PATH):\n",
    "    shutil.rmtree(DATA_PATH, ignore_errors=True)\n",
    "if os.path.exists(CACHE_DIR):\n",
    "    shutil.rmtree(CACHE_DIR, ignore_errors=True)\n",
    "if USE_PREPROCSSED_DATA:\n",
    "    if os.path.exists(PROCESSED_DATA_PATH):\n",
    "        shutil.rmtree(PROCESSED_DATA_PATH, ignore_errors=True)"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python (nlp_gpu)",
   "language": "python",
   "name": "nlp_gpu"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
